Univariate TS Models (ARIMA/SARIMA)

Code
# Packages used throughout this section
library(lubridate)   # decimal_date()
library(quantmod)    # getSymbols()
library(forecast)    # ggAcf(), ggPacf(), Arima(), auto.arima(), ggtsdisplay()
library(astsa)       # sarima()
library(tseries)     # adf.test()
library(gridExtra)   # grid.arrange()

canada <- read.csv("data/clean_data/24monthly_canada_freight.csv")
canada <- canada[,c("Date","Value")]
canada_ts <- ts(canada$Value, start = decimal_date(as.Date("2006-01-01", format = "%Y-%m-%d")), end = decimal_date(as.Date("2023-01-01", format = "%Y-%m-%d")), frequency = 12)


employment <- read.csv("data/clean_data/32monthly_employment.csv")
employment <- employment[,c("Date","Transportation.Employment...Air.Transportation")]
employment_ts <- ts(employment$Transportation.Employment...Air.Transportation, start = decimal_date(as.Date("2005-01-01", format = "%Y-%m-%d")), end = decimal_date(as.Date("2024-01-01", format = "%Y-%m-%d")), frequency = 12)


tsi <- read.csv("data/clean_data/35monthly_TSI.csv")
tsi_ts <- ts(tsi$Transportation.Services.Index...Freight, start = decimal_date(as.Date("2000-01-01", format = "%Y-%m-%d")), end = decimal_date(as.Date("2023-01-01", format = "%Y-%m-%d")), frequency = 12)


air <- read.csv("data/clean_data/33revenue.csv")
air <- air[air$Mode=="Air carrier, domestic",c("Year","Value")]
air <- air[order(air$Year), ]
air_ts <- ts(air$Value, start = decimal_date(as.Date("2000-01-01", format = "%Y-%m-%d")), end = decimal_date(as.Date("2021-01-01", format = "%Y-%m-%d")), frequency = 1)


ups <- getSymbols("UPS", auto.assign = FALSE, from = "2017-01-01", to = "2024-01-01", src = "yahoo")
ups <- data.frame(ups)
ups$date <- as.Date(rownames(ups), "%Y-%m-%d")
ups_ts <- ts(ups$UPS.Adjusted, start = decimal_date(as.Date("2017-01-03", format = "%Y-%m-%d")), frequency = 365.25)

ARIMA

ACF & PACF Plots

Code
plot1<-ggAcf(canada_ts)+ggtitle("U.S.-Canada Freight Value ACF") + theme_bw()
plot2<- ggPacf(canada_ts)+theme_bw()+ggtitle("U.S.-Canada Freight Value PACF")

grid.arrange(plot1, plot2,nrow=2)

Based on the ACF and PACF plots, we observe strong correlation at the early lags of the ACF, gradually decreasing but remaining relatively strong through lag 12. The PACF plot likewise shows strong correlation at lag 1, with moderate correlation extending to lag 13. The slow decay of the ACF and the significant correlations in both plots suggest that the series is non-stationary.

Code
plot1<-ggAcf(employment_ts)+ggtitle("U.S. Air Transportation Employment ACF") + theme_bw()
plot2<- ggPacf(employment_ts)+theme_bw()+ggtitle("U.S. Air Transportation Employment PACF")

grid.arrange(plot1, plot2,nrow=2)

Based on the ACF plot, there is strong correlation observed from lag 1 to lag 16, with the correlation gradually decreasing but remaining relatively strong throughout. Similarly, the PACF plot shows strong correlation at lag 1 and lag 2. Given the presence of significant correlations in both plots, we can infer that the series is likely non-stationary.

Code
plot1<-ggAcf(tsi_ts)+ggtitle("U.S. Freight Transportation Services Index ACF") + theme_bw()
plot2<- ggPacf(tsi_ts)+theme_bw()+ggtitle("U.S. Freight Transportation Services Index PACF")

grid.arrange(plot1, plot2,nrow=2)

Based on the ACF plot, there is a strong correlation observed at lag 1, with the correlation slightly decreasing but remaining relatively strong until the end of the plot. Similarly, the PACF plot shows strong correlation at lag 1. Given the presence of significant correlations in both plots, we can infer that the series is likely non-stationary.

Code
plot1<-ggAcf(air_ts)+ggtitle("U.S. Domestic Air Carrier Average Freight Revenue ACF") + theme_bw()
plot2<- ggPacf(air_ts)+theme_bw()+ggtitle("U.S. Domestic Air Carrier Average Freight Revenue PACF")

grid.arrange(plot1, plot2,nrow=2)

The ACF plot demonstrates strong correlation at lag 1 and moderate correlation at lags 2 and 3, while the PACF plot exhibits strong correlation only at lag 1. Given the significant correlation at lag 1 and the moderate correlation at subsequent lags, we can infer that the series is likely non-stationary.

Code
plot1<-ggAcf(ups_ts, lag=100)+ggtitle("UPS Stock Price ACF") + theme_bw()
plot2<- ggPacf(ups_ts, lag=100)+theme_bw()+ggtitle("UPS Stock Price PACF")

grid.arrange(plot1, plot2,nrow=2)

The ACF plot shows strong correlation beginning at lag 1, decreasing only slightly and remaining strong through the end of the plot, while the PACF plot exhibits strong correlation only at lag 1. The sustained autocorrelation in the ACF is characteristic of a non-stationary series.

Augmented Dickey-Fuller Test On Original Data

Code
tseries::adf.test(canada_ts)

    Augmented Dickey-Fuller Test

data:  canada_ts
Dickey-Fuller = -3.0511, Lag order = 5, p-value = 0.1356
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that our series is non-stationary. This conclusion aligns with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots, indicating non-stationarity.

Code
tseries::adf.test(employment_ts)

    Augmented Dickey-Fuller Test

data:  employment_ts
Dickey-Fuller = -3.1311, Lag order = 6, p-value = 0.1008
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that our series is non-stationary. This conclusion aligns with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots, indicating non-stationarity.

Code
tseries::adf.test(tsi_ts)

    Augmented Dickey-Fuller Test

data:  tsi_ts
Dickey-Fuller = -2.4166, Lag order = 6, p-value = 0.4005
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that our series is non-stationary. This conclusion aligns with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots, indicating non-stationarity.

Code
tseries::adf.test(air_ts)

    Augmented Dickey-Fuller Test

data:  air_ts
Dickey-Fuller = -0.85187, Lag order = 2, p-value = 0.9424
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that our series is non-stationary. This conclusion aligns with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots, indicating non-stationarity.

Code
tseries::adf.test(ups_ts)

    Augmented Dickey-Fuller Test

data:  ups_ts
Dickey-Fuller = -2.2106, Lag order = 12, p-value = 0.4892
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level. Consequently, we can conclude that our series is non-stationary. This conclusion aligns with the earlier analysis, where strong autocorrelation was observed in the ACF and PACF plots, indicating non-stationarity.
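As a complementary check, the KPSS test reverses the hypotheses (its null is stationarity), so agreement between the two tests strengthens the conclusion. A minimal sketch for one of the series:

```r
library(tseries)

# KPSS null hypothesis: the series is level-stationary. A small p-value
# therefore rejects stationarity, corroborating the ADF results above.
kpss.test(canada_ts, null = "Level")
```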

Differencing

Code
require(gridExtra)
fit = lm(canada_ts~time(canada_ts), na.action=NULL)

plot1<-autoplot(diff(canada_ts), main="First Difference") +theme_bw()
plot2 <- ggAcf(diff(canada_ts), 48) + ggtitle("First Difference Data")+theme_bw()

grid.arrange(plot1, plot2,nrow=2)

Code
canada_ts %>% diff() %>% ggtsdisplay()

The differenced data exhibits greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a substantial reduction in autocorrelation beyond those lags. However, despite the improvement, the first-order difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.
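As a quick cross-check of the visual diagnosis, the forecast package can estimate the required differencing orders directly; a sketch (the functions apply unit-root and seasonal-strength tests internally):

```r
library(forecast)

# ndiffs() suggests the ordinary differencing order d via a unit-root test;
# nsdiffs() suggests the seasonal differencing order D (period = 12 here).
ndiffs(canada_ts)
nsdiffs(canada_ts)
```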

Code
plot1 <- ggAcf(diff(canada_ts), 48) + ggtitle("First Difference Data")+theme_bw()
plot2 <- ggAcf(diff(diff(canada_ts)), 48) + ggtitle("Second Differenced Data") + theme_bw()

grid.arrange(plot1, plot2, nrow=2)

Code
canada_ts %>% diff() %>% diff() %>% ggtsdisplay()

After applying a second-order difference, the ACF plot shows strong negative autocorrelation at lag 1, which suggests the data may be over-differenced. Consequently, I have opted to retain the first-order differenced data, as it achieves a satisfactory level of stationarity.

Code
require(gridExtra)
fit1 = lm(employment_ts~time(employment_ts), na.action=NULL)

plot1<-autoplot(diff(employment_ts), main="First Difference") +theme_bw()
plot2 <- ggAcf(diff(employment_ts), 48) + ggtitle("First Difference Data")+theme_bw()

grid.arrange(plot1, plot2,nrow=2)

Code
employment_ts %>% diff() %>% ggtsdisplay()

The differenced data exhibits greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a substantial reduction in autocorrelation beyond those lags. However, despite the improvement, the first order difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.

Code
plot1 <- ggAcf(diff(employment_ts), 48) + ggtitle("First Difference Data")+theme_bw()
plot2 <- ggAcf(diff(diff(employment_ts)), 48) + ggtitle("Second Differenced Data") + theme_bw()

grid.arrange(plot1, plot2, nrow=2)

Code
employment_ts %>% diff() %>% diff() %>% ggtsdisplay()

The plots above show that second-order differencing effectively renders the data stationary, a crucial prerequisite for accurate modeling.

Code
require(gridExtra)
fit2 = lm(tsi_ts~time(tsi_ts), na.action=NULL)

plot1<- autoplot(diff(tsi_ts), main="First Difference") +theme_bw()
plot2 <- ggAcf(diff(tsi_ts), 48) + ggtitle("First Difference Data")+theme_bw()

grid.arrange(plot1, plot2,nrow=2)

Code
tsi_ts %>% diff() %>% ggtsdisplay()

From both the original plots and the ACF plots, it’s evident that the differenced data exhibits greater stationarity compared to the original data. The ACF plot of the differenced data shows a significant drop-off, indicating a lack of autocorrelation beyond those lags, which is characteristic of stationary data.

Code
require(gridExtra)
fit3 = lm(air_ts~time(air_ts), na.action=NULL)

plot1<-autoplot(diff(air_ts), main="First Difference") +theme_bw()
plot2 <- ggAcf(diff(air_ts), 48) + ggtitle("First Difference Data")+theme_bw()

grid.arrange(plot1, plot2,nrow=2)

Code
air_ts %>% diff() %>% ggtsdisplay()

The differenced data exhibits greater stationarity compared to the original data. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a lack of autocorrelation beyond those lags. This drop-off is characteristic of stationary data.

Code
require(gridExtra)
fit4 = lm(ups_ts~time(ups_ts), na.action=NULL)

plot1<-autoplot(diff(ups_ts), main="First Difference") +theme_bw()
plot2 <- ggAcf(diff(ups_ts), 48) + ggtitle("First Difference Data")+theme_bw()

grid.arrange(plot1, plot2,nrow=2)

Code
ups_ts %>% diff() %>% ggtsdisplay(lag=48)

The differenced data exhibits greater stationarity compared to the original. This improvement is evident in the ACF plot of the differenced data, which shows a significant drop-off, indicating a substantial reduction in autocorrelation beyond those lags. However, despite the improvement, the First Order Difference still shows some seasonality in the plot, suggesting that further differencing may be necessary to fully address the seasonality present in the data.

Code
plot1 <- ggAcf(diff(ups_ts), 48) + ggtitle("First Difference Data")+theme_bw()
plot2 <- ggAcf(diff(diff(ups_ts)), 48) + ggtitle("Second Differenced Data") + theme_bw()

grid.arrange(plot1, plot2, nrow=2)

Code
ups_ts %>% diff() %>% diff() %>% ggtsdisplay(lag=48)

After performing a second-order difference, strong negative autocorrelation appears at lag 1 in the ACF plot, which suggests the data may be over-differenced. Therefore, I have decided to retain the first-order differenced data, as it achieves a satisfactory level of stationarity.
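For a price series like UPS, a common alternative to differencing the raw prices is differencing the log series (log returns), which also stabilizes the variance. A sketch, not used in the analysis above:

```r
library(tseries)

# Log returns approximate percentage changes and are often closer to
# stationary (with more stable variance) than raw first differences.
ups_returns <- diff(log(ups_ts))
adf.test(ups_returns)
```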

Augmented Dickey-Fuller Test On Differenced Data

Code
tseries::adf.test(diff(canada_ts))

    Augmented Dickey-Fuller Test

data:  diff(canada_ts)
Dickey-Fuller = -6.549, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level, so we conclude the differenced series is stationary. This aligns with the earlier analysis: the ACF of the differenced series drops off quickly, which is characteristic of stationary data.

Code
tseries::adf.test(diff(diff(employment_ts)))

    Augmented Dickey-Fuller Test

data:  diff(diff(employment_ts))
Dickey-Fuller = -9.6886, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level, so we conclude the differenced series is stationary. This aligns with the earlier analysis: the ACF of the differenced series drops off quickly, which is characteristic of stationary data.

Code
tseries::adf.test(diff(tsi_ts))

    Augmented Dickey-Fuller Test

data:  diff(tsi_ts)
Dickey-Fuller = -5.025, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level, so we conclude the differenced series is stationary. This aligns with the earlier analysis: the ACF of the differenced series drops off quickly, which is characteristic of stationary data.

Code
tseries::adf.test(diff(air_ts))

    Augmented Dickey-Fuller Test

data:  diff(air_ts)
Dickey-Fuller = -2.1674, Lag order = 2, p-value = 0.5086
alternative hypothesis: stationary

With a p-value higher than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% significance level, so the differenced series still tests as non-stationary. This conflicts with the earlier visual analysis, where the ACF and PACF of the differenced data showed little autocorrelation; with only about twenty annual observations, however, the test has limited power.

Code
tseries::adf.test(diff(ups_ts))

    Augmented Dickey-Fuller Test

data:  diff(ups_ts)
Dickey-Fuller = -11.498, Lag order = 12, p-value = 0.01
alternative hypothesis: stationary

With a p-value smaller than 0.05, there is sufficient evidence to reject the null hypothesis at the 5% significance level, so we conclude the differenced series is stationary. This aligns with the earlier analysis: the ACF of the differenced series drops off quickly, which is characteristic of stationary data.

ARIMA(p,d,q)

In this section, we will identify potential values for the parameters p and q based on the significant lags observed in the PACF and ACF plots of the original data, respectively. Specifically, we will consider the most significant lags from the PACF plot for the value of p, and from the ACF plot for the value of q.

p = 0,1,2, d = 0,1,2, q = 0,1,2,3

Code
## empty list to store model fits

ARMA_res <- list()

## set counter
cc <-1

## loop over AR
for(p in 0:2){
  ## loop over MA
  for(q in 0:3){
    ## loop over I
    for(d in 0:2){
      ARMA_res[[cc]]<-Arima(y=canada_ts,order = c(p,d,q), include.drift = TRUE)
      cc<- cc+1
    }
  }
}

## get AIC values for model evaluation
ARMA_AIC<-sapply(ARMA_res,function(x) x$aic)
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
Series: canada_ts 
ARIMA(2,2,3) 

Coefficients:
          ar1      ar2     ma1      ma2     ma3
      -1.7310  -0.9992  0.7457  -0.7457  -1.000
s.e.   0.0025   0.0013  0.0251   0.0242   0.026

sigma^2 = 9.987:  log likelihood = -525.66
AIC=1063.32   AICc=1063.74   BIC=1083.19
Code
## get BIC values for model evaluation
ARMA_BIC<-sapply(ARMA_res,function(x) x$bic)
ARMA_res[[which(ARMA_BIC == min(ARMA_BIC))]]
Series: canada_ts 
ARIMA(2,2,3) 

Coefficients:
          ar1      ar2     ma1      ma2     ma3
      -1.7310  -0.9992  0.7457  -0.7457  -1.000
s.e.   0.0025   0.0013  0.0251   0.0242   0.026

sigma^2 = 9.987:  log likelihood = -525.66
AIC=1063.32   AICc=1063.74   BIC=1083.19
Code
## get AICC values for model evaluation
ARMA_AICC<-sapply(ARMA_res,function(x) x$aicc)
ARMA_res[[which(ARMA_AICC == min(ARMA_AICC))]]
Series: canada_ts 
ARIMA(2,2,3) 

Coefficients:
          ar1      ar2     ma1      ma2     ma3
      -1.7310  -0.9992  0.7457  -0.7457  -1.000
s.e.   0.0025   0.0013  0.0251   0.0242   0.026

sigma^2 = 9.987:  log likelihood = -525.66
AIC=1063.32   AICc=1063.74   BIC=1083.19

The model with the lowest AIC and BIC is ARIMA(2,2,3).
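The three separate lookups above can also be collapsed into one comparison table; a sketch reusing the `ARMA_res` list built in this chunk (`arimaorder()` is from the forecast package):

```r
library(forecast)

# Gather all three criteria for every candidate into one data frame and
# sort by AICc, so disagreements between criteria are visible at a glance.
scores <- data.frame(
  model = sapply(ARMA_res, function(m) paste(arimaorder(m), collapse = ",")),
  AIC   = sapply(ARMA_res, AIC),
  AICc  = sapply(ARMA_res, function(m) m$aicc),
  BIC   = sapply(ARMA_res, BIC)
)
head(scores[order(scores$AICc), ])
```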

p = 0,1,2, d = 0,1,2, q = 0,1,2,3

Code
## empty list to store model fits

ARMA_res <- list()

## set counter
cc <-1

## loop over AR
for(p in 0:2){
  ## loop over MA
  for(q in 0:3){
    ## loop over I
    for(d in 0:2){
    
      ARMA_res[[cc]]<-Arima(y=employment_ts,order = c(p,d,q), include.drift = TRUE)
      cc<- cc+1
    }
  }
}

## get AIC values for model evaluation
ARMA_AIC<-sapply(ARMA_res,function(x) x$aic)
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
Series: employment_ts 
ARIMA(2,2,3) 

Coefficients:
         ar1      ar2      ma1     ma2      ma3
      0.4253  -0.5218  -0.9902  0.3708  -0.3806
s.e.  0.1897   0.1252   0.1912  0.1990   0.1703

sigma^2 = 40180919:  log likelihood = -2309.48
AIC=4630.96   AICc=4631.35   BIC=4651.51
Code
## get BIC values for model evaluation
ARMA_BIC<-sapply(ARMA_res,function(x) x$bic)
ARMA_res[[which(ARMA_BIC == min(ARMA_BIC))]]
Series: employment_ts 
ARIMA(2,2,1) 

Coefficients:
         ar1      ar2      ma1
      0.4688  -0.2364  -1.0000
s.e.  0.0644   0.0643   0.0143

sigma^2 = 40643462:  log likelihood = -2311.73
AIC=4631.45   AICc=4631.63   BIC=4645.15
Code
## get AICC values for model evaluation
ARMA_AICC<-sapply(ARMA_res,function(x) x$aicc)
ARMA_res[[which(ARMA_AICC == min(ARMA_AICC))]]
Series: employment_ts 
ARIMA(2,2,3) 

Coefficients:
         ar1      ar2      ma1     ma2      ma3
      0.4253  -0.5218  -0.9902  0.3708  -0.3806
s.e.  0.1897   0.1252   0.1912  0.1990   0.1703

sigma^2 = 40180919:  log likelihood = -2309.48
AIC=4630.96   AICc=4631.35   BIC=4651.51

The model with the lowest AIC and AICc is ARIMA(2,2,3), while BIC favors the more parsimonious ARIMA(2,2,1); we proceed with ARIMA(2,2,3).

p = 0,1, d = 0,1, q = 0,1,2,3

Code
## empty list to store model fits

ARMA_res <- list()

## set counter
cc <-1

## loop over AR
for(p in 0:1){
  ## loop over MA
  for(q in 0:3){
    ## loop over I
    for(d in 0:1){
    
      ARMA_res[[cc]]<-Arima(y=tsi_ts,order = c(p,d,q), include.drift = TRUE)
      cc<- cc+1
    }
  }
}

## get AIC values for model evaluation
ARMA_AIC<-sapply(ARMA_res,function(x) x$aic)
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
Series: tsi_ts 
ARIMA(0,1,3) with drift 

Coefficients:
          ma1      ma2     ma3   drift
      -0.0700  -0.0927  0.1035  0.1127
s.e.   0.0608   0.0580  0.0578  0.0837

sigma^2 = 2.217:  log likelihood = -499.51
AIC=1009.03   AICc=1009.25   BIC=1027.13
Code
## get BIC values for model evaluation
ARMA_BIC<-sapply(ARMA_res,function(x) x$bic)
ARMA_res[[which(ARMA_BIC == min(ARMA_BIC))]]
Series: tsi_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.1141
s.e.  0.0901

sigma^2 = 2.246:  log likelihood = -502.8
AIC=1009.61   AICc=1009.65   BIC=1016.85
Code
## get AICC values for model evaluation
ARMA_AICC<-sapply(ARMA_res,function(x) x$aicc)
ARMA_res[[which(ARMA_AICC == min(ARMA_AICC))]]
Series: tsi_ts 
ARIMA(0,1,3) with drift 

Coefficients:
          ma1      ma2     ma3   drift
      -0.0700  -0.0927  0.1035  0.1127
s.e.   0.0608   0.0580  0.0578  0.0837

sigma^2 = 2.217:  log likelihood = -499.51
AIC=1009.03   AICc=1009.25   BIC=1027.13

The model with the lowest AIC and AICc is ARIMA(0,1,3) with drift, while BIC favors ARIMA(0,1,0) with drift; we proceed with ARIMA(0,1,3).

p = 0,1, d = 0,1, q = 0,1,2

Code
## empty list to store model fits

ARMA_res <- list()

## set counter
cc <-1

## loop over AR
for(p in 0:1){
  ## loop over MA
  for(q in 0:2){
    ## loop over I
    for(d in 0:1){
    
      ARMA_res[[cc]]<-Arima(y=air_ts,order = c(p,d,q), include.drift = TRUE)
      cc<- cc+1
    }
  }
}

## get AIC values for model evaluation
ARMA_AIC<-sapply(ARMA_res,function(x) x$aic)
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
Series: air_ts 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1    drift
      1.0000  -0.1510
s.e.  0.2724   5.1467

sigma^2 = 161:  log likelihood = -83.65
AIC=173.3   AICc=174.71   BIC=176.43
Code
## get BIC values for model evaluation
ARMA_BIC<-sapply(ARMA_res,function(x) x$bic)
ARMA_res[[which(ARMA_BIC == min(ARMA_BIC))]]
Series: air_ts 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1    drift
      1.0000  -0.1510
s.e.  0.2724   5.1467

sigma^2 = 161:  log likelihood = -83.65
AIC=173.3   AICc=174.71   BIC=176.43
Code
## get AICC values for model evaluation
ARMA_AICC<-sapply(ARMA_res,function(x) x$aicc)
ARMA_res[[which(ARMA_AICC == min(ARMA_AICC))]]
Series: air_ts 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1    drift
      1.0000  -0.1510
s.e.  0.2724   5.1467

sigma^2 = 161:  log likelihood = -83.65
AIC=173.3   AICc=174.71   BIC=176.43

The model with the lowest AIC and BIC is ARIMA(0,1,1).

p = 0,1, d = 0,1,2, q = 0,1,2

Code
## empty list to store model fits

ARMA_res <- list()

## set counter
cc <-1

## loop over AR
for(p in 0:1){
  ## loop over MA
  for(q in 0:2){
    ## loop over I
    for(d in 0:2){
    
      ARMA_res[[cc]]<-Arima(y=ups_ts,order = c(p,d,q), include.drift = TRUE)
      cc<- cc+1
    }
  }
}

## get AIC values for model evaluation
ARMA_AIC<-sapply(ARMA_res,function(x) x$aic)
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
Series: ups_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0366
s.e.  0.0557

sigma^2 = 5.465:  log likelihood = -3989.16
AIC=7982.32   AICc=7982.33   BIC=7993.27
Code
## get BIC values for model evaluation
ARMA_BIC<-sapply(ARMA_res,function(x) x$bic)
ARMA_res[[which(ARMA_BIC == min(ARMA_BIC))]]
Series: ups_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0366
s.e.  0.0557

sigma^2 = 5.465:  log likelihood = -3989.16
AIC=7982.32   AICc=7982.33   BIC=7993.27
Code
## get AICC values for model evaluation
ARMA_AICC<-sapply(ARMA_res,function(x) x$aicc)
ARMA_res[[which(ARMA_AICC == min(ARMA_AICC))]]
Series: ups_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0366
s.e.  0.0557

sigma^2 = 5.465:  log likelihood = -3989.16
AIC=7982.32   AICc=7982.33   BIC=7993.27

The model with the lowest AIC and BIC is ARIMA(0,1,0).

Fit Best ARIMA(p,d,q) & Diagnostics

Code
ARMA_res_canada<-Arima(y=canada_ts,order = c(2,2,3), include.drift = TRUE)
summary(ARMA_res_canada)
Series: canada_ts 
ARIMA(2,2,3) 

Coefficients:
          ar1      ar2     ma1      ma2     ma3
      -1.7310  -0.9992  0.7457  -0.7457  -1.000
s.e.   0.0025   0.0013  0.0251   0.0242   0.026

sigma^2 = 9.987:  log likelihood = -525.66
AIC=1063.32   AICc=1063.74   BIC=1083.19

Training set error measures:
                      ME     RMSE      MAE        MPE     MAPE      MASE
Training set -0.02179075 3.105814 2.276102 -0.2721476 4.798624 0.3998011
                    ACF1
Training set -0.01487147

Equation: \(x_t= 0.2690x_{t-1} +1.4628x_{t-2} +0.2674x_{t-3} -0.9992x_{t-4} +w_t +0.7457w_{t-1} -0.7457w_{t-2} -w_{t-3}\)

Code
output <- capture.output(sarima(canada_ts,2,2,3))

Code
cat(output[120:131], output[length(output)], sep = "\n") 
Coefficients: 
    Estimate     SE   t.value p.value
ar1  -1.7310 0.0025 -701.3924       0
ar2  -0.9992 0.0013 -759.3772       0
ma1   0.7457 0.0251   29.6534       0
ma2  -0.7457 0.0242  -30.8026       0
ma3  -1.0000 0.0260  -38.3988       0

sigma^2 estimated as 9.741094 on 198 degrees of freedom 
 
AIC = 5.238006  AICc = 5.239506  BIC = 5.335933 
 
 

Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation, and the Ljung-Box p-values fall below the 0.05 threshold, so significant residual autocorrelation remains. The normal Q-Q plot suggests the normality assumption is reasonable, apart from a few potential outliers. The model is an improvement, but the leftover correlation suggests the specification could still be refined.
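The same diagnostics can be reproduced in one call with forecast::checkresiduals(), which plots the residuals, their ACF, and a histogram, and prints a Ljung-Box test:

```r
library(forecast)

# One-call residual diagnostics for the fitted ARIMA(2,2,3):
# residual time plot, residual ACF, histogram, and a Ljung-Box test.
checkresiduals(ARMA_res_canada)
```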

Code
ARMA_res_employment<-Arima(y=employment_ts,order = c(2,2,3), include.drift = TRUE)
summary(ARMA_res_employment)
Series: employment_ts 
ARIMA(2,2,3) 

Coefficients:
         ar1      ar2      ma1     ma2      ma3
      0.4253  -0.5218  -0.9902  0.3708  -0.3806
s.e.  0.1897   0.1252   0.1912  0.1990   0.1703

sigma^2 = 40180919:  log likelihood = -2309.48
AIC=4630.96   AICc=4631.35   BIC=4651.51

Training set error measures:
                   ME     RMSE      MAE        MPE      MAPE      MASE
Training set 263.3125 6241.208 2809.261 0.05105339 0.6005098 0.1456793
                    ACF1
Training set 0.009897766

Equation: \(x_t= 2.4253x_{t-1} -2.3724x_{t-2} +1.4689x_{t-3} -0.5218x_{t-4} +w_t -0.9902w_{t-1} +0.3708w_{t-2} -0.3806w_{t-3}\)

Code
output <- capture.output(sarima(employment_ts,2,2,3))

Code
cat(output[55:66], output[length(output)], sep = "\n") 
Coefficients: 
    Estimate     SE t.value p.value
ar1   0.4253 0.1897  2.2413  0.0260
ar2  -0.5218 0.1252 -4.1690  0.0000
ma1  -0.9902 0.1912 -5.1783  0.0000
ma2   0.3708 0.1990  1.8633  0.0637
ma3  -0.3806 0.1703 -2.2346  0.0264

sigma^2 estimated as 39293604 on 222 degrees of freedom 
 
AIC = 20.40072  AICc = 20.40192  BIC = 20.49125 
 
 

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. The ACF plot of the standardized residuals shows no significant lags, indicating no departures from the model assumptions. The normal Q-Q plot suggests the normality assumption is reasonable, apart from a few possible outliers. Finally, the Ljung-Box p-values all exceed 0.05, indicating no significant residual autocorrelation. Overall, the model fits the data well.

Code
ARMA_res_tsi<-Arima(y=tsi_ts,order = c(0,1,3), include.drift = TRUE)
summary(ARMA_res_tsi)
Series: tsi_ts 
ARIMA(0,1,3) with drift 

Coefficients:
          ma1      ma2     ma3   drift
      -0.0700  -0.0927  0.1035  0.1127
s.e.   0.0608   0.0580  0.0578  0.0837

sigma^2 = 2.217:  log likelihood = -499.51
AIC=1009.03   AICc=1009.25   BIC=1027.13

Training set error measures:
                       ME     RMSE      MAE         MPE      MAPE      MASE
Training set 0.0007283871 1.475506 1.068367 -0.01205377 0.9267941 0.2634875
                    ACF1
Training set 0.004977272

Equation: \(x_t= x_{t-1} +w_t -0.07*w_{t-1} -0.0927*w_{t-2} +0.1035*w_{t-3} +0.1127\)

Code
output <- capture.output(sarima(tsi_ts,0,1,3))

Code
cat(output[21:31], output[length(output)], sep = "\n") 
Coefficients: 
         Estimate     SE t.value p.value
ma1       -0.0700 0.0608 -1.1505  0.2509
ma2       -0.0927 0.0580 -1.6003  0.1107
ma3        0.1035 0.0578  1.7915  0.0743
constant   0.1127 0.0837  1.3458  0.1795

sigma^2 estimated as 2.184965 on 272 degrees of freedom 
 
AIC = 3.655896  AICc = 3.656431  BIC = 3.721483 
 
 

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. The ACF plot of the standardized residuals shows no significant lags, indicating no departures from the model assumptions. The normal Q-Q plot suggests the normality assumption is reasonable, apart from a few possible outliers. Finally, the Ljung-Box p-values all exceed 0.05, indicating no significant residual autocorrelation. Overall, the model fits the data well.

Code
ARMA_res_air<-Arima(y=air_ts,order = c(0,1,1), include.drift = TRUE)
summary(ARMA_res_air)
Series: air_ts 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1    drift
      1.0000  -0.1510
s.e.  0.2724   5.1467

sigma^2 = 161:  log likelihood = -83.65
AIC=173.3   AICc=174.71   BIC=176.43

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
Training set 0.2965719 11.79289 8.992931 0.1888945 9.662775 0.8915005 -0.264417

Equation: \(x_t= x_{t-1} +w_t +w_{t-1} -0.151\)

Code
output <- capture.output(sarima(air_ts,0,1,1))

Code
cat(output[30:38], output[length(output)], sep = "\n") 
Coefficients: 
         Estimate     SE t.value p.value
ma1         1.000 0.2724  3.6708  0.0016
constant   -0.151 5.1467 -0.0293  0.9769

sigma^2 estimated as 145.6943 on 19 degrees of freedom 
 
AIC = 8.252253  AICc = 8.283999  BIC = 8.40147 
 
 

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. The ACF plot of the standardized residuals shows no significant lags, indicating no departures from the model assumptions. The normal Q-Q plot suggests the normality assumption is reasonable, apart from a few possible outliers. Finally, the Ljung-Box p-values all exceed 0.05, indicating no significant residual autocorrelation. Overall, the model fits the data well.

Code
ARMA_res_ups<-Arima(y=ups_ts,order = c(0,1,0), include.drift = TRUE)
summary(ARMA_res_ups)
Series: ups_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0366
s.e.  0.0557

sigma^2 = 5.465:  log likelihood = -3989.16
AIC=7982.32   AICc=7982.33   BIC=7993.27

Training set error measures:
                    ME     RMSE      MAE         MPE     MAPE       MASE
Training set 5.174e-05 2.336454 1.513771 -0.01543307 1.161839 0.04821099
                   ACF1
Training set 0.03201809

Equation: \(x_t= x_{t-1} + w_t +0.0366\)

Code
output <- capture.output(sarima(ups_ts,0,1,0))

Code
cat(output[11:18], output[length(output)], sep = "\n") 
Coefficients: 
         Estimate     SE t.value p.value
constant   0.0366 0.0557  0.6569  0.5113

sigma^2 estimated as 5.462118 on 1758 degrees of freedom 
 
AIC = 4.537988  AICc = 4.537989  BIC = 4.54421 
 
 

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. The ACF plot of the standardized residuals shows no significant lags, indicating no departures from the model assumptions. The normal Q-Q plot suggests the normality assumption is reasonable, apart from a few possible outliers. Finally, the Ljung-Box p-values all exceed 0.05, indicating no significant residual autocorrelation. Overall, the model fits the data well.

Auto ARIMA

Code
auto.arima(canada_ts)
Series: canada_ts 
ARIMA(2,1,5)(1,0,0)[12] 

Coefficients:
          ar1      ar2     ma1     ma2     ma3      ma4     ma5    sar1
      -0.1410  -0.4216  0.1452  0.4440  0.0703  -0.3019  0.2020  0.5636
s.e.   0.1988   0.2128  0.1977  0.2187  0.0836   0.0907  0.1092  0.0718

sigma^2 = 8.695:  log likelihood = -508.85
AIC=1035.71   AICc=1036.63   BIC=1065.57

The best model selected by auto.arima differs from the one identified in the previous step and includes a seasonal AR term. It is plausible that this dataset exhibits strong seasonality that a non-seasonal ARIMA struggles to capture, so SARIMA models should be considered to model the seasonality effectively.
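As a sketch of that seasonal extension, the model auto.arima selected can be refit explicitly with Arima() so it is directly comparable with the manual fits (orders taken from the output above, not re-tuned):

```r
library(forecast)

# Fit ARIMA(2,1,5)(1,0,0)[12], the specification chosen by auto.arima.
sarima_canada <- Arima(canada_ts,
                       order    = c(2, 1, 5),
                       seasonal = list(order = c(1, 0, 0), period = 12))
summary(sarima_canada)
```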

Code
auto.arima(employment_ts)
Series: employment_ts 
ARIMA(0,1,1) 

Coefficients:
         ma1
      0.4204
s.e.  0.0536

sigma^2 = 41251784:  log likelihood = -2322.12
AIC=4648.25   AICc=4648.3   BIC=4655.11

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.
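One way to sanity-check this behavior is to watch the search itself. A minimal sketch (assuming the `forecast` package): `trace = TRUE` prints every candidate Auto ARIMA evaluates, and `stepwise = FALSE` widens the search beyond the default stepwise shortcut.

```r
# Print each candidate model and its AICc as the stepwise search proceeds.
auto.arima(employment_ts, trace = TRUE)

# A slower but more thorough search over the model space:
auto.arima(employment_ts, stepwise = FALSE, approximation = FALSE)
```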

Code
auto.arima(tsi_ts)
Series: tsi_ts 
ARIMA(3,1,0)(0,0,2)[12] with drift 

Coefficients:
          ar1      ar2     ar3     sma1     sma2   drift
      -0.0787  -0.0820  0.0791  -0.1773  -0.1487  0.1261
s.e.   0.0605   0.0614  0.0623   0.0634   0.0655  0.0560

sigma^2 = 2.129:  log likelihood = -493.41
AIC=1000.82   AICc=1001.24   BIC=1026.16

The best model provided by Auto ARIMA differs from the one identified in the previous step, suggesting that the Auto ARIMA fit may improve on our earlier, suboptimal choice. It is plausible that this dataset exhibits strong seasonality, which a non-seasonal ARIMA struggles to capture adequately; SARIMA models may therefore be needed to model the seasonality effectively.

Code
auto.arima(air_ts)
Series: air_ts 
ARIMA(0,1,0) 

sigma^2 = 213.5:  log likelihood = -86.12
AIC=174.23   AICc=174.44   BIC=175.28

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.

Code
auto.arima(ups_ts)
Series: ups_ts 
ARIMA(0,1,0) 

sigma^2 = 5.463:  log likelihood = -3989.38
AIC=7980.75   AICc=7980.75   BIC=7986.22

The best model provided by Auto ARIMA matches the best model identified in the previous step.

Forecasting

Code
pred_canada <- forecast(ARMA_res_canada, 12)
autoplot(pred_canada, xlab = "Date", ylab = "Billions", colour = "blue")+ggtitle('U.S.-Canada Freight Value Forecast') +theme_bw()

The blue line represents the forecast, while the two shades of purple depict the prediction intervals: the darker (inner) band is the 80% interval and the lighter (outer) band the 95% interval. The wider the band, the greater the uncertainty in the forecast.
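The band levels can also be set explicitly rather than relying on the defaults. A minimal sketch, reusing `ARMA_res_canada` from the fit above:

```r
# forecast() defaults to 80% and 95% prediction intervals; level= makes
# the choice explicit (here adding a 99% band as well).
pred_canada <- forecast(ARMA_res_canada, h = 12, level = c(80, 95, 99))
autoplot(pred_canada) + theme_bw()
```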

Examining the forecast plot, there is an initial surge in the monthly U.S.-Canada freight value, followed by a subsequent decline and another increase. Comparing the observed increasing seasonal fluctuations post-2020 to the forecasted fluctuations, it appears that the model accurately predicted the anticipated pattern.

Code
pred_employment <- forecast(ARMA_res_employment, 12)
autoplot(pred_employment, xlab = "Date", ylab = "Employment", colour = "blue")+ggtitle('U.S. Air Transportation Employment Forecast') +theme_bw()

The forecast plot illustrates a subtle decreasing trend in air transportation employment over the next 12 months. The colored bands depict the 95% confidence interval. However, the forecast appears overly smooth and lacks fluctuations. This suggests that the model may not adequately capture the inherent variability or fluctuations in the data.

Code
pred_tsi <- forecast(ARMA_res_tsi, 12)
autoplot(pred_tsi, xlab = "Date", ylab = "Transportation Services Index", colour = "blue")+ggtitle('U.S. Freight Transportation Services Index Forecast') +theme_bw()

The forecast plot indicates an ascending trend in the U.S. freight TSI over the ensuing 12 months. The colored bands delineate the 95% confidence interval. Nonetheless, the forecast appears excessively smooth, lacking the expected fluctuations. This suggests that the model might not adequately capture the inherent variability or fluctuations in the data.

Code
pred_air <- forecast(ARMA_res_air, 5)
autoplot(pred_air, xlab = "Date", ylab = "Freight revenue per ton-mile (current cents)", colour = "blue")+ggtitle('U.S. Domestic Air Carrier Average Freight Revenue Forecast') +theme_bw()

The forecast plot depicts a stagnant trend in the U.S. domestic air carrier average freight revenue over the upcoming five years. The colored bands represent the 95% confidence interval. However, the forecast appears excessively smooth, lacking any noticeable fluctuations. This suggests that the model might not adequately capture the inherent variability or fluctuations in the data.

Code
pred_ups <- forecast(ARMA_res_ups, 50)
autoplot(pred_ups, xlab = "Date", ylab = "Stock Price", colour = "blue")+ggtitle('UPS Stock Price Forecast') +theme_bw()

The forecast plot illustrates a rising trend in the UPS stock price over the forthcoming 50 days. The colored bands represent the 95% confidence interval. However, the forecast appears overly smooth, lacking noticeable fluctuations. This suggests that the model may not adequately capture the inherent variability or fluctuations in the stock price data.

ARIMA vs. Benchmark Methods

Code
autoplot(canada_ts, xlab = "Date", ylab = "Billions") +
  autolayer(meanf(canada_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(canada_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(rwf(canada_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(pred_canada, 
            series="ARIMA",PI=FALSE) + ggtitle("U.S.-Canada Freight Value ARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the ARIMA model best captures the trend, closely aligning with the observed data. The drift method exhibits a comparable forecast trajectory. In contrast, the Mean and Naive benchmarks notably underperform, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(pred_canada)
output2 <- accuracy(meanf(canada_ts, h=12))
output3 <- accuracy(naive(canada_ts, h=12))
output4 <- accuracy(rwf(canada_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")

output_df
                 ME     RMSE      MAE        MPE      MAPE      MASE
ARIMA -2.179075e-02 3.105814 2.276102 -0.2721476  4.798624 0.3998011
Meanf -1.819683e-15 7.158385 5.209884 -2.1984228 11.014570 0.9151249
Naive  8.439239e-02 3.747662 2.676507 -0.1592665  5.543406 0.4701329
Drift -2.088947e-15 3.746711 2.677095 -0.3338399  5.547840 0.4702363
             ACF1
ARIMA -0.01487147
Meanf  0.85587296
Naive -0.27042733
Drift -0.27042733

Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.
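For reference, the error measures in these tables are defined as follows, where \(e_t\) denotes the training residuals and \(y_t\) the observations:

\[
\mathrm{RMSE}=\sqrt{\frac{1}{T}\sum_{t=1}^{T} e_t^2},\qquad
\mathrm{MAE}=\frac{1}{T}\sum_{t=1}^{T} |e_t|,\qquad
\mathrm{MAPE}=\frac{100}{T}\sum_{t=1}^{T}\left|\frac{e_t}{y_t}\right|
\]

MASE scales the MAE by the in-sample one-step naive forecast error, so values below 1 indicate the model beats the naive benchmark.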

Code
autoplot(employment_ts, xlab = "Date", ylab = "Employment") +
  autolayer(meanf(employment_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(employment_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(rwf(employment_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(pred_employment, 
            series="ARIMA",PI=FALSE) + ggtitle("U.S. Air Transportation Employment ARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the ARIMA, Drift, and Naive models exhibit comparable forecast trajectories and best capture the trend, closely aligning with the observed data. In contrast, the Mean benchmark notably underperforms, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(pred_employment)
output2 <- accuracy(meanf(employment_ts, h=12))
output3 <- accuracy(naive(employment_ts, h=12))
output4 <- accuracy(rwf(employment_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")

output_df
                 ME      RMSE       MAE         MPE      MAPE      MASE
ARIMA  2.633125e+02  6241.208  2809.261  0.05105339 0.6005098 0.1456793
Meanf -2.235212e-11 25101.751 21628.241 -0.28679049 4.5989461 1.1215716
Naive -8.421053e+01  7028.726  3175.439 -0.02909029 0.6871776 0.1646681
Drift -4.059817e-12  7028.221  3184.303 -0.01132370 0.6889589 0.1651278
             ACF1
ARIMA 0.009897766
Meanf 0.957521645
Naive 0.376333238
Drift 0.376333238

Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.

Code
autoplot(tsi_ts, xlab = "Date", ylab = "Transportation Services Index") +
  autolayer(meanf(tsi_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(tsi_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(rwf(tsi_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(pred_tsi, 
            series="ARIMA",PI=FALSE) + ggtitle("U.S. Freight Transportation Services Index ARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the ARIMA model best captures the trend, closely aligning with the observed data. The drift method exhibits a comparable forecast trajectory. In contrast, the Mean and Naive benchmarks notably underperform, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(pred_tsi)
output2 <- accuracy(meanf(tsi_ts, h=12))
output3 <- accuracy(naive(tsi_ts, h=12))
output4 <- accuracy(rwf(tsi_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")

output_df
                 ME      RMSE       MAE         MPE      MAPE      MASE
ARIMA  7.283871e-04  1.475506  1.068367 -0.01205377 0.9267941 0.2634875
Meanf -1.056836e-14 13.009848 11.112711 -1.22065210 9.5135777 2.7406872
Naive  1.141304e-01  1.500374  1.096014  0.08659501 0.9496963 0.2703060
Drift  2.500415e-15  1.496027  1.087839 -0.01244364 0.9430018 0.2682897
              ACF1
ARIMA  0.004977272
Meanf  0.987698495
Naive -0.064131500
Drift -0.064131500

Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.

Code
autoplot(air_ts, xlab = "Date", ylab = "Freight revenue per ton-mile (current cents)") +
  autolayer(meanf(air_ts, h=5),
            series="Mean", PI=FALSE) +
  autolayer(naive(air_ts, h=5),
            series="Naïve", PI=FALSE)+
  autolayer(rwf(air_ts, h=5, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(pred_air, 
            series="ARIMA",PI=FALSE) + ggtitle("U.S. Domestic Air Carrier Average Freight Revenue ARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the ARIMA model seems to underperform here, producing a flat forecast that fails to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(pred_air)
output2 <- accuracy(meanf(air_ts, h=12))
output3 <- accuracy(naive(air_ts, h=12))
output4 <- accuracy(rwf(air_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")

output_df
                 ME     RMSE       MAE          MPE      MAPE      MASE
ARIMA  2.965719e-01 11.79289  8.992931   0.18889448  9.662775 0.8915005
Meanf -2.306236e-15 31.95599 29.965043 -12.98493762 35.854642 2.9705387
Naive  9.865931e-01 14.61164 10.087410  -0.04930048 11.202702 1.0000000
Drift -3.383537e-15 14.57830  9.911415  -1.18896301 10.999729 0.9825529
            ACF1
ARIMA -0.2644170
Meanf  0.8911918
Naive  0.2602701
Drift  0.2602701

Despite the flat forecast, the accuracy metrics show that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model, suggesting it still outperforms the benchmark models in in-sample accuracy.

Code
autoplot(ups_ts, xlab = "Date", ylab = "Stock Price") +
  autolayer(meanf(ups_ts, h=50),
            series="Mean", PI=FALSE) +
  autolayer(naive(ups_ts, h=50),
            series="Naïve", PI=FALSE)+
  autolayer(rwf(ups_ts, h=50, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(pred_ups, 
            series="ARIMA",PI=FALSE) + ggtitle("UPS Stock Price ARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the ARIMA, Drift, and Naive models exhibit comparable forecast trajectories and best capture the trend, closely aligning with the observed data. In contrast, the Mean benchmark notably underperforms, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(pred_ups)
output2 <- accuracy(meanf(ups_ts, h=12))
output3 <- accuracy(naive(ups_ts, h=12))
output4 <- accuracy(rwf(ups_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("ARIMA", "Meanf", "Naive", "Drift")

output_df
                 ME      RMSE       MAE          MPE      MAPE       MASE
ARIMA  5.174000e-05  2.336454  1.513771  -0.01543307  1.161839 0.04821099
Meanf -2.713587e-13 40.426672 38.180111 -10.12057156 31.863702 1.21597047
Naive  3.660448e-02  2.337404  1.515062   0.01546448  1.162754 0.04825209
Drift -4.281767e-15  2.337117  1.514580  -0.01549867  1.162442 0.04823675
            ACF1
ARIMA 0.03201809
Meanf 0.99795274
Naive 0.03201824
Drift 0.03201824

Upon examining the accuracy metrics, it’s evident that the ARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the ARIMA model outperforms the benchmark models in terms of accuracy.

SARIMA

ACF & PACF Plots

Code
canada_ts %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

Significant changes are apparent in the seasonally differenced dataset, evident from the time series plot. Although the seasonal cycles persist at seasonal lags 1 and 2, there is a discernible decrease in correlation in both the ACF and PACF plots. This reduction signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.

  • P: 1,2 D: 1 Q: 1,2
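The `diff(lag=12) %>% diff()` pipeline above applies one seasonal and one ordinary difference, which in backshift notation is:

\[
(1-B)(1-B^{12})\,x_t = x_t - x_{t-1} - x_{t-12} + x_{t-13}
\]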
Code
employment_ts %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

Significant changes are apparent in the seasonally differenced dataset, evident from the time series plot. Although the seasonal cycle persists at seasonal lag 1, there is a discernible decrease in correlation in both the ACF and PACF plots. This reduction signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.

  • P: 1,2 D: 1 Q: 1
Code
tsi_ts %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

Significant changes are apparent in the seasonally differenced dataset, evident from the time series plot. Although the seasonal cycles persist at seasonal lags 1 and 2, there is a discernible decrease in correlation in both the ACF and PACF plots. This reduction signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.

  • P: 1,2 D: 1 Q: 1
Code
ups_ts %>% diff(lag=12) %>% diff() %>% ggtsdisplay(lag=40)

Significant changes are apparent in the seasonally differenced dataset, evident from the time series plot. Although the seasonal cycles persist at seasonal lags 1 and 2, there is a discernible decrease in correlation in both the ACF and PACF plots. This reduction signifies a potential enhancement in stationarity, suggesting that the seasonal differencing has effectively mitigated the seasonality inherent in the original data.

  • P: 1 D: 1 Q: 1

Finding Model Parameters For Selected Models

In this section, we identify candidate values for the parameters p, q, P, and Q based on the significant lags observed in the ACF and PACF plots of the differenced data. Specifically, we take the most significant PACF lags for the values of p and P, and the most significant ACF lags for the values of q and Q.

p = 0,1,2,3,4 d = 0,1,2 q = 0,1,2,3,4, P: 1,2 D: 1, Q: 1,2

Code
SARIMA.c=function(p1,p2,q1,q2,P1,P2,Q1,Q2,d1,d2,data){
  
  temp=c()
  D=1
  s=12
  
  i=1
  temp= data.frame()
  ls=matrix(rep(NA,9*500),nrow=500)  # 500 rows: enough for the full parameter grid
  
  
  for (p in p1:p2)
  {
    for(q in q1:q2)
    {
      for(P in P1:P2)  # search the full seasonal AR range
      {
        for(Q in Q1:Q2)  # search the full seasonal MA range
        {
          for(d in d1:d2)
            {
          if(p+d+q+P+D+Q<=10)
          {
            
            model<- Arima(data,order=c(p,d,q),seasonal=c(P,D,Q))
            ls[i,]= c(p,d,q,P,D,Q,model$aic,model$bic,model$aicc)
            i=i+1
            #print(i)
            
          }
          
        }
      }
    }
    
  }
  }
  
  temp= as.data.frame(ls)
  names(temp)= c("p","d","q","P","D","Q","AIC","BIC","AICc")
  
  temp
  
}


output=SARIMA.c(p1=0,p2=4,q1=0,q2=2,P1=0,P2=2,Q1=0,Q2=2,d1=0,d2=2,data=canada_ts)
#output

output[which.min(output$AIC),] 
   p d q P D Q      AIC      BIC     AICc
43 4 0 2 0 1 0 999.1474 1021.986 999.7528
Code
output[which.min(output$BIC),] 
  p d q P D Q      AIC     BIC     AICc
2 0 1 0 0 1 0 1012.312 1015.57 1012.333
Code
output[which.min(output$AICc),] 
   p d q P D Q      AIC      BIC     AICc
43 4 0 2 0 1 0 999.1474 1021.986 999.7528

The model with the lowest AIC and AICc is ARIMA(4,0,2)X(0,1,0)12; BIC instead favors the simpler ARIMA(0,1,0)X(0,1,0)12. We proceed with ARIMA(4,0,2)X(0,1,0)12.
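As a quick cross-check of this choice, the two candidates can be fit directly and their information criteria compared side by side. A minimal sketch, assuming the `forecast` package:

```r
# Fit the AIC/AICc winner and the BIC winner on the same series.
fit_aic <- Arima(canada_ts, order = c(4, 0, 2), seasonal = c(0, 1, 0))
fit_bic <- Arima(canada_ts, order = c(0, 1, 0), seasonal = c(0, 1, 0))

c(AIC = fit_aic$aic, AICc = fit_aic$aicc, BIC = fit_aic$bic)
c(AIC = fit_bic$aic, AICc = fit_bic$aicc, BIC = fit_bic$bic)
```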

p = 0,1,2, d = 0,1,2, q = 0,1,2,3, P: 1,2, D: 1, Q: 1,

Code
SARIMA.c=function(p1,p2,q1,q2,P1,P2,Q1,Q2,d1,d2,data){
  
  temp=c()
  D=1
  s=12
  
  i=1
  temp= data.frame()
  ls=matrix(rep(NA,9*500),nrow=500)  # 500 rows: enough for the full parameter grid
  
  
  for (p in p1:p2)
  {
    for(q in q1:q2)
    {
      for(P in P1:P2)  # search the full seasonal AR range
      {
        for(Q in Q1:Q2)  # search the full seasonal MA range
        {
          for(d in d1:d2)
            {
          if(p+d+q+P+D+Q<=10)
          {
            
            model<- Arima(data,order=c(p,d,q),seasonal=c(P,D,Q))
            ls[i,]= c(p,d,q,P,D,Q,model$aic,model$bic,model$aicc)
            i=i+1
            #print(i)
            
          }
          
        }
      }
    }
    
  }
  }

  temp= as.data.frame(ls)
  names(temp)= c("p","d","q","P","D","Q","AIC","BIC","AICc")
  
  temp
  
}


output=SARIMA.c(p1=0,p2=2,q1=0,q2=3,P1=0,P2=2,Q1=0,Q2=1,d1=0,d2=2,data=employment_ts)
#output

output[which.min(output$AIC),] 
   p d q P D Q      AIC      BIC     AICc
35 2 1 3 0 1 0 4523.232 4543.484 4523.634
Code
output[which.min(output$BIC),] 
   p d q P D Q      AIC      BIC     AICc
35 2 1 3 0 1 0 4523.232 4543.484 4523.634
Code
output[which.min(output$AICc),] 
   p d q P D Q      AIC      BIC     AICc
35 2 1 3 0 1 0 4523.232 4543.484 4523.634

The model with the lowest AIC and BIC is ARIMA(2,1,3)X(0,1,0)12.

p = 0,1 d = 0,1, q = 0,1,2,3 P: 1,2 D: 0,1, Q: 1

Code
SARIMA.c=function(p1,p2,q1,q2,P1,P2,Q1,Q2,d1,d2,data){
  
  temp=c()
  D=0
  s=12
  
  i=1
  temp= data.frame()
  ls=matrix(rep(NA,9*500),nrow=500)  # 500 rows: enough for the full parameter grid
  
  
  for (p in p1:p2)
  {
    for(q in q1:q2)
    {
      for(P in P1:P2)  # search the full seasonal AR range
      {
        for(Q in Q1:Q2)  # search the full seasonal MA range
        {
          for(d in d1:d2)
            {
          if(p+d+q+P+D+Q<=10)
          {
            
            model<- Arima(data,order=c(p,d,q),seasonal=c(P,D,Q))
            ls[i,]= c(p,d,q,P,D,Q,model$aic,model$bic,model$aicc)
            i=i+1
            #print(i)
            
          }
          
        }
      }
    }
    
  }
  }

  temp= as.data.frame(ls)
  names(temp)= c("p","d","q","P","D","Q","AIC","BIC","AICc")
  
  temp
  
}


output=SARIMA.c(p1=0,p2=1,q1=0,q2=3,P1=0,P2=2,Q1=0,Q2=1,d1=0,d2=1,data=tsi_ts)
#output

output[which.min(output$AIC),] 
   p d q P D Q      AIC      BIC    AICc
16 1 1 3 0 0 0 1008.547 1026.649 1008.77
Code
output[which.min(output$BIC),] 
  p d q P D Q      AIC      BIC     AICc
2 0 1 0 0 0 0 1009.209 1012.829 1009.223
Code
output[which.min(output$AICc),] 
   p d q P D Q      AIC      BIC    AICc
16 1 1 3 0 0 0 1008.547 1026.649 1008.77

The model with the lowest AIC and AICc is ARIMA(1,1,3) with no seasonal terms; BIC instead favors ARIMA(0,1,0). We proceed with ARIMA(1,1,3).

p = 0,1, d = 0,1, q = 0,1, P: 1, D: 1, Q: 1

Code
SARIMA.c=function(p1,p2,q1,q2,P1,P2,Q1,Q2,d1,d2,data){
  
  temp=c()
  D=1
  s=365
  
  i=1
  temp= data.frame()
  ls=matrix(rep(NA,9*500),nrow=500)  # 500 rows: enough for the full parameter grid
  
  
  for (p in p1:p2)
  {
    for(q in q1:q2)
    {
      for(P in P1:P2)  # search the full seasonal AR range
      {
        for(Q in Q1:Q2)  # search the full seasonal MA range
        {
          for(d in d1:d2)
            {
          if(p+d+q+P+D+Q<=10)
          {
            
            model<- Arima(data,order=c(p,d,q),seasonal=c(P,D,Q))
            ls[i,]= c(p,d,q,P,D,Q,model$aic,model$bic,model$aicc)
            i=i+1
            #print(i)
            
          }
          
        }
      }
    }
    
  }
  }

  temp= as.data.frame(ls)
  names(temp)= c("p","d","q","P","D","Q","AIC","BIC","AICc")
  
  temp
  
}


output=SARIMA.c(p1=0,p2=1,q1=0,q2=1,P1=0,P2=1,Q1=0,Q2=1,d1=0,d2=1,data=ups_ts)
#output

output[which.min(output$AIC),] 
  p d q P D Q      AIC      BIC     AICc
2 0 1 0 0 1 0 7409.144 7414.384 7409.147
Code
output[which.min(output$BIC),] 
  p d q P D Q      AIC      BIC     AICc
2 0 1 0 0 1 0 7409.144 7414.384 7409.147
Code
output[which.min(output$AICc),] 
  p d q P D Q      AIC      BIC     AICc
2 0 1 0 0 1 0 7409.144 7414.384 7409.147

The model with the lowest AIC and BIC is ARIMA(0,1,0)X(0,1,0)12.

Fit Best SARIMA(P,D,Q) & Diagnostics

Code
set.seed(99)
output <- capture.output(sarima(canada_ts,4,0,2,0,1,0,12))

Code
cat(output[96:108], output[length(output)], sep = "\n") 
Coefficients: 
         Estimate     SE  t.value p.value
ar1        0.6848 0.0713   9.5986  0.0000
ar2       -0.6141 0.0601 -10.2185  0.0000
ar3        0.8666 0.0611  14.1891  0.0000
ar4       -0.2061 0.0708  -2.9106  0.0040
ma1        0.3393 0.0166  20.4202  0.0000
ma2        1.0000 0.0244  41.0032  0.0000
constant   0.1101 0.1546   0.7117  0.4775

sigma^2 estimated as 9.275475 on 186 degrees of freedom 
 
AIC = 5.184752  AICc = 5.187888  BIC = 5.319993 
 

Equation: \(x_t= 0.6848*x_{t-1} -0.6141*x_{t-2} +0.8666*x_{t-3} -0.2061*x_{t-4} +x_{t-12} -0.6848*x_{t-13} +0.6141*x_{t-14} -0.8666*x_{t-15} +0.2061*x_{t-16} +w_t +0.3393*w_{t-1} +w_{t-2} +0.1101\)

Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation. The normal Q-Q plot suggests that the assumption of normality is reasonable, apart from potential outliers. The Ljung-Box p-values fall below the 0.05 (5%) significance threshold, indicating some remaining autocorrelation in the residuals. Despite this, the model appears improved over the non-seasonal fit.

Code
set.seed(99)
output <- capture.output(sarima(employment_ts,2,1,3,0,1,0,12))

Code
cat(output[48:58], output[length(output)], sep = "\n") 
Coefficients: 
    Estimate     SE  t.value p.value
ar1   0.1874 0.0457   4.0975   1e-04
ar2  -0.8431 0.0411 -20.5313   0e+00
ma1   0.3593 0.0768   4.6805   0e+00
ma2   0.9952 0.0298  33.4238   0e+00
ma3   0.3723 0.0741   5.0210   0e+00

sigma^2 estimated as 66844570 on 211 degrees of freedom 
 
AIC = 20.94089  AICc = 20.94221  BIC = 21.03465 
 

Equation: \(x_t= 1.1874*x_{t-1} -1.0305*x_{t-2} +0.8431*x_{t-3} +x_{t-12} -1.1874*x_{t-13} +1.0305*x_{t-14} -0.8431*x_{t-15} +w_t +0.3593*w_{t-1} +0.9952*w_{t-2} +0.3723*w_{t-3}\)

Upon inspection of the time plot of the standardized residuals above, no discernible patterns are evident. However, the ACF plot of the standardized residuals indicates some remaining correlation. The normal Q-Q plot suggests that the assumption of normality is reasonable, apart from potential outliers. The Ljung-Box p-values fall below the 0.05 (5%) significance threshold, indicating some remaining autocorrelation in the residuals. Despite this, the model appears improved over the non-seasonal fit.

Code
set.seed(99)
output <- capture.output(sarima(tsi_ts,1,1,3,0,0,0,12))

Code
cat(output[45:55], output[length(output)], sep = "\n") 
Coefficients: 
         Estimate     SE t.value p.value
ar1        0.5867 0.2181  2.6905  0.0076
ma1       -0.6583 0.2206 -2.9849  0.0031
ma2       -0.0582 0.0737 -0.7888  0.4309
ma3        0.1708 0.0650  2.6269  0.0091
constant   0.1053 0.0975  1.0795  0.2813

sigma^2 estimated as 2.172189 on 271 degrees of freedom 
 
AIC = 3.657447  AICc = 3.658252  BIC = 3.736151 
 

Equation: \(x_t= 1.5867*x_{t-1} -0.5867*x_{t-2} +w_t -0.6583*w_{t-1} -0.0582*w_{t-2} +0.1708*w_{t-3} +0.1053\)

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. Additionally, the ACF plot of the standardized residuals reveals no significant departures from the model assumptions, as no significant lags are observed. The normal Q-Q plot of the residuals indicates that the assumption of normality is reasonable, with the exception of possible outliers. Furthermore, the Ljung-Box test yields favorable values exceeding 0.05, suggesting the absence of significant correlation and indicating a well-fitted model. Overall, these observations suggest that the model fits the data well.

Code
set.seed(99)
output <- capture.output(sarima(ups_ts,0,1,0,0,1,0,12))

Code
cat(output, output[length(output)], sep = "\n") 
Coefficients: 
     Estimate p.value

sigma^2 estimated as 11.05997 on 1747 degrees of freedom 
 
AIC = 5.242354  AICc = 5.242354  BIC = 5.245482 
 
 

Equation: \(x_t= x_{t-1} + x_{t-12} -x_{t-13} +w_t\)

Upon examining the time plot of the standardized residuals, no discernible patterns are evident. The ACF plot of the standardized residuals reveals just one significant departure from the model assumptions. The normal Q-Q plot indicates that the assumption of normality is reasonable, apart from possible outliers. The Ljung-Box p-values fall below the 0.05 threshold, indicating some remaining autocorrelation. Despite these findings, the model appears improved.

Auto ARIMA

Code
auto.arima(canada_ts)
Series: canada_ts 
ARIMA(2,1,5)(1,0,0)[12] 

Coefficients:
          ar1      ar2     ma1     ma2     ma3      ma4     ma5    sar1
      -0.1410  -0.4216  0.1452  0.4440  0.0703  -0.3019  0.2020  0.5636
s.e.   0.1988   0.2128  0.1977  0.2187  0.0836   0.0907  0.1092  0.0718

sigma^2 = 8.695:  log likelihood = -508.85
AIC=1035.71   AICc=1036.63   BIC=1065.57

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.

Code
auto.arima(employment_ts)
Series: employment_ts 
ARIMA(0,1,1) 

Coefficients:
         ma1
      0.4204
s.e.  0.0536

sigma^2 = 41251784:  log likelihood = -2322.12
AIC=4648.25   AICc=4648.3   BIC=4655.11

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.

Code
auto.arima(tsi_ts)
Series: tsi_ts 
ARIMA(3,1,0)(0,0,2)[12] with drift 

Coefficients:
          ar1      ar2     ar3     sma1     sma2   drift
      -0.0787  -0.0820  0.0791  -0.1773  -0.1487  0.1261
s.e.   0.0605   0.0614  0.0623   0.0634   0.0655  0.0560

sigma^2 = 2.129:  log likelihood = -493.41
AIC=1000.82   AICc=1001.24   BIC=1026.16

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.

Code
auto.arima(ups_ts)
Series: ups_ts 
ARIMA(0,1,0) 

sigma^2 = 5.463:  log likelihood = -3989.38
AIC=7980.75   AICc=7980.75   BIC=7986.22

The discrepancy between the best model provided by Auto ARIMA and the one identified in the previous step raises concerns about its reliability. Auto ARIMA relies solely on minimizing AIC/BIC and does not account for the significant lags visible in the ACF/PACF plots, so it may overfit or fail to capture the underlying patterns in the data.

Forecasting

Code
fit_canada <- Arima(canada_ts, order=c(4,0,2), seasonal=c(0,1,0))

# forecast the next 12 months
fit_canada %>% forecast(h=12) %>% autoplot(xlab = "Date", ylab = "Billions", colour = "blue")+ggtitle('U.S.-Canada Freight Value Forecast')+theme_bw() 

The blue line represents the forecast, while the two shades of purple depict the prediction intervals: the darker (inner) band is the 80% interval and the lighter (outer) band the 95% interval. The wider the band, the greater the uncertainty in the forecast.

Examining the forecast plot, there is an initial surge in the monthly U.S.-Canada freight value, followed by a subsequent decline and another increase. Comparing the observed increasing seasonal fluctuations post-2020 to the forecasted fluctuations, it appears that the model accurately predicted the anticipated pattern.

Code
fit_employment <- Arima(employment_ts, order=c(2,1,3), seasonal=c(0,1,0))

# forecast the next 12 months
fit_employment %>% forecast(h=12) %>% autoplot(xlab = "Date", ylab = "Employment", colour = "blue")+ggtitle('U.S. Air Transportation Employment Forecast') +theme_bw() 

The forecast plot illustrates a subtle decreasing trend in air transportation employment over the next 12 months. The colored bands depict the 95% confidence interval. However, the forecast appears overly smooth and lacks fluctuations. This suggests that the model may not adequately capture the inherent variability or fluctuations in the data.

Code
fit_tsi <- Arima(tsi_ts, order=c(1,1,3), seasonal=c(0,0,0))

# forecast the next 12 months
fit_tsi %>% forecast(h=12) %>% autoplot( xlab = "Date", ylab = "Transportation Services Index", colour = "blue")+ggtitle('U.S. Freight Transportation Services Index Forecast') +theme_bw()

The forecast plot indicates an ascending trend in the U.S. freight TSI over the ensuing 12 months. The colored bands delineate the 95% confidence interval. Nonetheless, the forecast appears excessively smooth, lacking the expected fluctuations. This suggests that the model might not adequately capture the inherent variability or fluctuations in the data.

Code
fit_ups <- Arima(ups_ts, order=c(0,1,0), seasonal=c(0,1,0))

# forecast the next 50 days
fit_ups %>% forecast(h=50) %>% autoplot( xlab = "Date", ylab = "Stock Price", colour = "blue")+ggtitle('UPS Stock Price Forecast') +theme_bw()

The forecast plot depicts a rising trend in the UPS stock price over the next 50 days. The shaded bands show the 80% and 95% prediction intervals. Notably, the forecast closely follows the fluctuation pattern observed in the historical data, suggesting the model captures much of the variability in the stock price.

SARIMA vs. Benchmark Methods

Code
autoplot(canada_ts, xlab = "Date", ylab = "Billions") +
  autolayer(meanf(canada_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(canada_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(snaive(canada_ts, h=12),
            series="SNaïve", PI=FALSE)+
  autolayer(rwf(canada_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(forecast(fit_canada,12), 
            series="SARIMA",PI=FALSE) + ggtitle("U.S.-Canada Freight Value SARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive method follows a comparable trajectory. In contrast, the Mean and Naive benchmarks notably underperform, failing to capture the underlying trend.
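For reference, the benchmark point forecasts being compared are simple closed-form rules. A base-R sketch on a toy monthly series (the forecast functions `meanf`, `naive`, `snaive`, and `rwf` implement the same ideas, with prediction intervals added):

```r
# Benchmark one-step point forecasts, computed by hand on a toy series.
y <- c(10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23,  # year 1
       11, 13, 15, 14, 16, 18, 17, 19, 21, 20, 22, 24)  # year 2
n <- length(y); m <- 12                      # n observations, seasonal period m

mean_fc   <- mean(y)                         # Mean: average of all observations
naive_fc  <- y[n]                            # Naive: last observation
snaive_fc <- y[n - m + 1]                    # SNaive (h=1): same month last year
drift_fc  <- y[n] + (y[n] - y[1]) / (n - 1)  # Drift (h=1): last obs + average change
```

These rules explain the plot: SNaive tracks the seasonal shape, Drift extrapolates the long-run slope, and Mean flattens everything to one level.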

Code
# Compute accuracy
output1 <- accuracy(forecast(fit_canada,12))
output2 <- accuracy(meanf(canada_ts, h=12))
output3 <- accuracy(naive(canada_ts, h=12))
output4 <- accuracy(snaive(canada_ts, h=12))
output5 <- accuracy(rwf(canada_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")

output_df
                  ME     RMSE      MAE         MPE      MAPE      MASE
SARIMA  1.593321e-01 2.958692 2.117014  0.09347661  4.440010 0.3718570
Meanf  -1.819683e-15 7.158385 5.209884 -2.19842282 11.014570 0.9151249
Naive   8.439239e-02 3.747662 2.676507 -0.15926651  5.543406 0.4701329
SNaive  1.362544e+00 7.718869 5.693085  1.00899209 12.230962 1.0000000
Drift  -2.088947e-15 3.746711 2.677095 -0.33383993  5.547840 0.4702363
               ACF1
SARIMA  0.005197202
Meanf   0.855872956
Naive  -0.270427328
SNaive  0.902091441
Drift  -0.270427328

Upon examining the accuracy metrics, it’s evident that the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the SARIMA model outperforms the benchmark models in terms of accuracy.
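For readers unfamiliar with these columns, the headline metrics reported by `accuracy()` can be written out directly. A base-R sketch on toy numbers (not the freight data):

```r
# The headline accuracy() metrics, computed by hand on toy values.
actual <- c(100, 102, 105, 103, 108)
fitted <- c( 99, 103, 104, 105, 107)
err <- actual - fitted                  # residuals

ME   <- mean(err)                       # mean error (bias)
RMSE <- sqrt(mean(err^2))               # root mean squared error
MAE  <- mean(abs(err))                  # mean absolute error
MPE  <- mean(err / actual) * 100        # mean percentage error
MAPE <- mean(abs(err / actual)) * 100   # mean absolute percentage error
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses error relative to the scale of the series, which is why all three are worth checking together.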

Code
autoplot(employment_ts, xlab = "Date", ylab = "Employment") +
  autolayer(meanf(employment_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(employment_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(snaive(employment_ts, h=12),
            series="SNaïve", PI=FALSE)+
  autolayer(rwf(employment_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(forecast(fit_employment,12), 
            series="SARIMA",PI=FALSE) + ggtitle("U.S. Air Transportation Employment SARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive method follows a comparable trajectory. In contrast, the Mean benchmark notably underperforms, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(forecast(fit_employment,12))
output2 <- accuracy(meanf(employment_ts, h=12))
output3 <- accuracy(naive(employment_ts, h=12))
output4 <- accuracy(snaive(employment_ts, h=12))
output5 <- accuracy(rwf(employment_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")

output_df
                  ME      RMSE       MAE          MPE      MAPE      MASE
SARIMA -4.505618e+01  7941.374  3598.365 -0.004728991 0.7711819 0.1865997
Meanf  -2.235212e-11 25101.751 21628.241 -0.286790486 4.5989461 1.1215716
Naive  -8.421053e+01  7028.726  3175.439 -0.029090287 0.6871776 0.1646681
SNaive  0.000000e+00 28713.604 19283.871 -0.196309220 4.1647999 1.0000000
Drift  -4.059817e-12  7028.221  3184.303 -0.011323704 0.6889589 0.1651278
              ACF1
SARIMA -0.01374762
Meanf   0.95752164
Naive   0.37633324
SNaive  0.93334964
Drift   0.37633324

Upon examining the accuracy metrics, the Naive and Drift models achieved the lowest RMSE, MAE, and MAPE values, followed by the SARIMA model. This suggests that the SARIMA model did not outperform the benchmark models in terms of accuracy here.
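The MASE column makes these comparisons explicit: each model's MAE is divided by the in-sample MAE of the seasonal naive method, which is why the SNaive row scores essentially 1. A base-R sketch on a toy series with seasonal period 3:

```r
# Why SNaive's MASE is 1: MASE scales a model's MAE by the in-sample
# MAE of the seasonal naive forecast.
y <- c(5, 7, 9, 6, 8, 10, 7, 9, 11)  # toy series, seasonal period m = 3
m <- 3
snaive_err <- y[-(1:m)] - y[1:(length(y) - m)]  # y_t - y_{t-m}
scale <- mean(abs(snaive_err))                  # seasonal naive in-sample MAE

mase <- function(e) mean(abs(e)) / scale
mase(snaive_err)  # seasonal naive scores 1 against its own scale
```

A MASE below 1 therefore means the model beats the seasonal naive baseline on the training data, which is how the SARIMA and Drift rows in these tables should be read.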

Code
autoplot(tsi_ts, xlab = "Date", ylab = "Transportation Services Index") +
  autolayer(meanf(tsi_ts, h=12),
            series="Mean", PI=FALSE) +
  autolayer(naive(tsi_ts, h=12),
            series="Naïve", PI=FALSE)+
  autolayer(snaive(tsi_ts, h=12),
            series="SNaïve", PI=FALSE)+
  autolayer(rwf(tsi_ts, h=12, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(forecast(fit_tsi,12), 
            series="SARIMA",PI=FALSE) + ggtitle("U.S. Freight Transportation Services Index SARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the SARIMA model best captures the trend, closely aligning with the observed data. The SNaive method follows a comparable trajectory. In contrast, the Mean and Naive benchmarks notably underperform, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(forecast(fit_tsi,12))
output2 <- accuracy(meanf(tsi_ts, h=12))
output3 <- accuracy(naive(tsi_ts, h=12))
output4 <- accuracy(snaive(tsi_ts, h=12))
output5 <- accuracy(rwf(tsi_ts, drift=TRUE, h=12))

output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")

output_df
                  ME      RMSE       MAE         MPE      MAPE      MASE
SARIMA  9.533649e-02  1.474058  1.070741  0.07317726 0.9278111 0.2640729
Meanf  -1.056836e-14 13.009848 11.112711 -1.22065210 9.5135777 2.7406872
Naive   1.141304e-01  1.500374  1.096014  0.08659501 0.9496963 0.2703060
SNaive  1.778491e+00  5.242155  4.054717  1.39387516 3.5019130 1.0000000
Drift   2.500415e-15  1.496027  1.087839 -0.01244364 0.9430018 0.2682897
               ACF1
SARIMA -0.002323126
Meanf   0.987698495
Naive  -0.064131500
SNaive  0.889680535
Drift  -0.064131500

Upon examining the accuracy metrics, it’s evident that the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, followed by the Drift model. This suggests that the SARIMA model outperforms the benchmark models in terms of accuracy.

Code
autoplot(ups_ts, xlab = "Date", ylab = "Stock Price") +
  autolayer(meanf(ups_ts, h=50),
            series="Mean", PI=FALSE) +
  autolayer(naive(ups_ts, h=50),
            series="Naïve", PI=FALSE)+
  autolayer(snaive(ups_ts, h=50),
            series="SNaïve", PI=FALSE)+
  autolayer(rwf(ups_ts, h=50, drift=TRUE),
            series="Drift", PI=FALSE)+
   autolayer(forecast(fit_ups,50), 
            series="SARIMA",PI=FALSE) + ggtitle("UPS Stock Price SARIMA vs. Benchmarks")+
  guides(colour=guide_legend(title="Forecast")) +theme_bw()

We observe that the SARIMA, Naive, and Drift forecasts track each other closely and align with the observed data, while the Mean and SNaive benchmarks notably underperform, failing to capture the underlying trend.

Code
# Compute accuracy
output1 <- accuracy(forecast(fit_ups,50))
output2 <- accuracy(meanf(ups_ts, h=50))
output3 <- accuracy(naive(ups_ts, h=50))
output4 <- accuracy(snaive(ups_ts, h=50))
output5 <- accuracy(rwf(ups_ts, drift=TRUE, h=50))

output_list <- list(output1, output2, output3, output4, output5)

# Put into df
output_df <- do.call(rbind, output_list)

# Set row names for the dataframe
row.names(output_df) <- c("SARIMA", "Meanf", "Naive", "SNaive", "Drift")

output_df
                  ME      RMSE       MAE          MPE      MAPE       MASE
SARIMA  5.174000e-05  2.336454  1.513771  -0.01543307  1.161839 0.04821099
Meanf  -2.713587e-13 40.426672 38.180111 -10.12057156 31.863702 1.21597047
Naive   3.660448e-02  2.337404  1.515062   0.01546448  1.162754 0.04825209
SNaive  2.000856e+01 42.830677 31.480384  11.47406485 20.258650 1.00259578
Drift  -4.281767e-15  2.337117  1.514580  -0.01549867  1.162442 0.04823675
             ACF1
SARIMA 0.03201809
Meanf  0.99795274
Naive  0.03201824
SNaive 0.99551308
Drift  0.03201824

Upon examining the accuracy metrics, the SARIMA model achieved the lowest RMSE, MAE, and MAPE values, though only marginally ahead of the Drift and Naive models. This suggests the SARIMA model offers at best a slight accuracy edge over these benchmarks.

Cross Validation

Code
# seasonal cross-validation using 1-step-ahead forecasts
farima1 <- function(x, h){forecast(Arima(x, order=c(4,0,2),seasonal = c(0,1,0)),h=h)}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(canada_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm=TRUE)) # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 0.1707379 
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 3.525084 
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(canada_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm=TRUE)) # 12-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 0.6267446 
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 7.491664 
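One caveat on these numbers: `abs(mean(e1, na.rm=TRUE))` is the absolute *mean* error, not the MAE, because opposite-signed errors cancel inside the mean. That cancellation is why the reported "MAE" values sit far below the RMSE values. A base-R illustration with toy errors:

```r
# abs(mean(e)) is the absolute mean error, not the MAE: opposite-signed
# errors cancel in the mean, so it can badly understate the typical error.
e <- c(3, -3, 4, -4, 2)        # toy cross-validation errors

abs_mean_err <- abs(mean(e))   # what the code above reports: 0.4
true_mae     <- mean(abs(e))   # mean absolute error: 3.2
```

When comparing models, `mean(abs(e), na.rm=TRUE)` is the quantity that matches the MAE column of `accuracy()`.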
Code
# seasonal cross-validation using 1-step-ahead forecasts
farima1 <- function(x, h){forecast(Arima(x, order=c(2,1,3),seasonal = c(0,1,0)),h=h)}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(employment_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm=TRUE)) # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 262.7295 
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 9779.526 
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(employment_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm=TRUE)) # 12-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 940.5674 
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 33433.24 
Code
# cross-validation using 1-step-ahead forecasts (nonseasonal ARIMA)
farima1 <- function(x, h){forecast(Arima(x, order=c(1,1,3),seasonal = c(0,0,0)),h=h)}

# Compute cross-validated errors for 1-step-ahead forecasts
e1 <- tsCV(tsi_ts, forecastfunction = farima1, h = 1)
mae1 <- abs(mean(e1, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e1))
rmse1 <- sqrt(mean(e1^2, na.rm=TRUE)) # one-step time series cross-validation
cat("1-Step Forecast MAE:", mae1, "\n")
1-Step Forecast MAE: 0.1098334 
Code
cat("1-Step Forecast RMSE:", rmse1, "\n")
1-Step Forecast RMSE: 1.548448 
Code
# Compute cross-validated errors for up to 12 steps ahead
e <- tsCV(tsi_ts, forecastfunction = farima1, h = 12)

# Compute the MAE and RMSE values
mae <- abs(mean(e, na.rm=TRUE)) # note: absolute mean error; a true MAE would use mean(abs(e))
rmse <- sqrt(mean(e^2, na.rm=TRUE)) # 12-step time series cross-validation
cat("S-Step Forecast MAE:", mae, "\n")
S-Step Forecast MAE: 0.8662205 
Code
cat("S-Step Forecast RMSE:", rmse, "\n")
S-Step Forecast RMSE: 3.856692 
Code
# seasonal cross-validation using 1-step-ahead forecasts (commented out; too slow for this daily series)
#farima1 <- function(x, h){forecast(Arima(x, order=c(0,1,0),seasonal = c(0,1,0)),h=h)}

# Compute cross-validated errors for up to 1 steps ahead
#e1 <- tsCV(ups_ts, forecastfunction = farima1, h = 1)
#mae1 <-abs(mean(e1,na.rm=TRUE))
#rmse1=sqrt(mean(e1^2, na.rm=TRUE)) #one-step time series cross-validation
#cat("1-Step Forecast MAE:", mae1, "\n")
#cat("1-Step Forecast RMSE:", rmse1, "\n")

# Compute cross-validated errors for up to 365 steps ahead
#e <- tsCV(ups_ts, forecastfunction = farima1, h = 365)

# Compute the MAE and RMSE values
#mae <-abs(mean(e,na.rm=TRUE))
#rmse <-sqrt(mean(e^2, na.rm=TRUE)) #s-step time series cross-validation
#cat("S-Step Forecast MAE:", mae, "\n")
#cat("S-Step Forecast RMSE:", rmse, "\n")

This cross-validation is commented out: refitting the model at every origin of a long daily series exceeds the available runtime, so no output is shown.
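For intuition, `tsCV()` performs a rolling-origin evaluation: at each origin t it applies the forecast function to the data up to t, forecasts h steps ahead, and records the error against the held-out observation. A minimal base-R sketch with a plain naive forecast (`tsCV` generalizes this to any forecast function, and refitting an ARIMA at every origin is what makes it expensive for a long daily series like `ups_ts`):

```r
# Minimal rolling-origin cross-validation with a naive 1-step forecast.
y <- c(10, 11, 13, 12, 15, 14, 16)
h <- 1
errors <- sapply(seq_len(length(y) - h), function(t) {
  fc <- y[t]        # naive 1-step forecast from origin t
  y[t + h] - fc     # forecast error at time t + h
})
rmse <- sqrt(mean(errors^2))
```

Because every error is computed on data the "model" never saw, rolling-origin RMSE is a more honest measure of forecast accuracy than in-sample residuals.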

Back to top